In statistics and machine learning, discretization refers to the process of converting or partitioning continuous attributes, features or variables into discretized or nominal attributes/features/variables/intervals. This can be useful when creating probability mass functions – formally, in density estimation. It is a form of discretization in general and also of binning, as in making a histogram. Whenever continuous data is discretized, some amount of discretization error is introduced; the goal is to keep that error at a level considered negligible for the modeling purposes at hand.

Typically, data is discretized into ''K'' intervals of equal width (equal-width binning) or into intervals that each contain roughly 1/''K'' of the data points (equal-frequency binning). Mechanisms for discretizing continuous data include Fayyad & Irani's MDL method,〔Fayyad, Usama M.; Irani, Keki B. (1993), ''Proceedings of the International Joint Conference on Uncertainty in AI'' (Q334 .I571 1993), pp. 1022–1027〕 which uses mutual information to recursively define the best bins, as well as CAIM, CACC, Ameva, and many others.〔Dougherty, J.; Kohavi, R.; Sahami, M. (1995). "Supervised and Unsupervised Discretization of Continuous Features". In A. Prieditis & S. J. Russell, eds. Morgan Kaufmann, pp. 194–202〕 Many machine learning algorithms are known to produce better models when continuous attributes are discretized.

== See also ==
* Density estimation
* Continuity correction
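The two simple schemes described above, equal-width and equal-frequency binning, can be sketched as follows. This is a minimal illustration using NumPy, not part of the original article; the function names are invented for the example.

```python
import numpy as np

def equal_width_bins(x, k):
    """Assign each value in x to one of k intervals of equal width."""
    edges = np.linspace(np.min(x), np.max(x), k + 1)
    # np.digitize returns 1-based bin indices; clip so the maximum
    # value falls in the last bin instead of an overflow bin.
    return np.clip(np.digitize(x, edges), 1, k) - 1

def equal_frequency_bins(x, k):
    """Assign each value in x to one of k bins holding roughly len(x)/k points."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1))
    return np.clip(np.digitize(x, edges), 1, k) - 1

x = np.array([1.0, 2.0, 2.5, 3.0, 10.0, 11.0])
print(equal_width_bins(x, 2))      # [0 0 0 0 1 1]: split at the range midpoint 6
print(equal_frequency_bins(x, 2))  # [0 0 0 1 1 1]: split at the median 2.75
```

Note how the same data is partitioned differently: equal-width binning is sensitive to outliers that stretch the range, while equal-frequency binning balances the bin counts regardless of the spread of the values.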